Survival Analysis

Overview

Survival analysis models the time until an event occurs. It handles censored data, where the event has not yet been observed for some subjects, and supports lifetime prediction and risk assessment.

Key Concepts

Survival Time

Time until event

Censoring

Event not observed during follow-up (for example, the subject dropped out or the study ended)

Hazard

Instantaneous event rate at time t, given survival up to t

Survival Curve

Probability of surviving past time t

Hazard Ratio

Ratio of the hazards in two groups, a relative measure of risk
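
The concepts above are tied together by a few standard identities. As a compact reference, for a continuous survival time T (the symbols S, h, H and the group labels A and B are notation assumed here, not taken from the text above):

```latex
\begin{align*}
S(t) &= P(T > t) \\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = -\frac{d}{dt} \log S(t) \\
H(t) &= \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\} \\
\mathrm{HR}(t) &= \frac{h_A(t)}{h_B(t)}
\end{align*}
```

Under the proportional hazards assumption, HR(t) is the same constant at every t.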

Common Models

Kaplan-Meier

Non-parametric survival curves

Cox Proportional Hazards

Semi-parametric regression

Weibull/Exponential

Parametric models

Log-rank Test

Comparing survival curves

Competing Risks

Multiple event types

Implementation with Python

    import warnings

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from lifelines import KaplanMeierFitter, CoxPHFitter
    from lifelines.statistics import logrank_test

    # Silences all library warnings to keep the demo output short;
    # remove this line when analysing real data.
    warnings.filterwarnings('ignore')

    # Generate sample survival data
    np.random.seed(42)
    n_patients = 200

    # Time to event (in months)
    event_times = np.random.exponential(scale=24, size=n_patients)

    # Event indicator (1 = event occurred, 0 = censored)
    event_observed = np.random.binomial(1, 0.7, n_patients)

    # Group assignment (0 = control, 1 = treatment)
    group = np.random.binomial(1, 0.5, n_patients)

    # Age at baseline
    age = np.random.uniform(30, 80, n_patients)

    # Risk score
    risk_score = np.random.uniform(0, 100, n_patients)

    # Adjust event times based on group (simulate treatment effect).
    # Age and risk score are drawn independently of the event times,
    # so their fitted effects should be close to zero.
    event_times = event_times * (1 + group * 0.3)

    df = pd.DataFrame({
        'time': event_times,
        'event': event_observed,
        'group': group,
        'age': age,
        'risk_score': risk_score,
    })

    print("Survival Data Summary:")
    print(df.head(10))
    print(f"\nTotal subjects: {len(df)}")
    print(f"Events: {df['event'].sum()} ({df['event'].sum() / len(df) * 100:.1f}%)")
    print(f"Censored: {(1 - df['event']).sum()} ({(1 - df['event']).sum() / len(df) * 100:.1f}%)")

    # 1. Kaplan-Meier Estimation
    kmf = KaplanMeierFitter()
    kmf.fit(df['time'], df['event'], label='Overall')

    print("\n1. Kaplan-Meier Survival Estimates:")
    print(f"Median survival time: {kmf.median_survival_time_:.1f} months")
    print(f"6-month survival: {kmf.predict(6):.1%}")
    print(f"12-month survival: {kmf.predict(12):.1%}")
    print(f"24-month survival: {kmf.predict(24):.1%}")

    # 2. Group Comparison
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Overall survival curve
    ax = axes[0, 0]
    kmf.plot_survival_function(ax=ax, linewidth=2)
    ax.set_xlabel('Time (months)')
    ax.set_ylabel('Survival Probability')
    ax.set_title('Kaplan-Meier Survival Curve (Overall)')
    ax.grid(True, alpha=0.3)

    # Survival curves by group
    ax = axes[0, 1]
    for group_val in [0, 1]:
        mask = df['group'] == group_val
        kmf.fit(df[mask]['time'], df[mask]['event'],
                label="Control" if group_val == 0 else "Treatment")
        kmf.plot_survival_function(ax=ax, linewidth=2)
    ax.set_xlabel('Time (months)')
    ax.set_ylabel('Survival Probability')
    ax.set_title('Kaplan-Meier Curves by Group')
    ax.grid(True, alpha=0.3)

    # 3. Log-Rank Test
    mask_control = df['group'] == 0
    mask_treatment = df['group'] == 1
    results = logrank_test(
        df[mask_control]['time'],
        df[mask_treatment]['time'],
        event_observed_A=df[mask_control]['event'],
        event_observed_B=df[mask_treatment]['event'],
    )
    print("\n3. Log-Rank Test:")
    print(f"Test statistic: {results.test_statistic:.4f}")
    print(f"P-value: {results.p_value:.4f}")
    print(f"Significant: {'Yes' if results.p_value < 0.05 else 'No'}")

    # 4. Risk Groups (by quartiles)
    risk_labels = ['Low', 'Medium-Low', 'Medium-High', 'High']
    df['risk_quartile'] = pd.qcut(df['risk_score'], q=4, labels=risk_labels)
    ax = axes[1, 0]
    for risk_group in risk_labels:
        mask = df['risk_quartile'] == risk_group
        kmf.fit(df[mask]['time'], df[mask]['event'], label=risk_group)
        kmf.plot_survival_function(ax=ax, linewidth=2)
    ax.set_xlabel('Time (months)')
    ax.set_ylabel('Survival Probability')
    ax.set_title('Kaplan-Meier Curves by Risk Quartile')
    ax.legend()
    ax.grid(True, alpha=0.3)

    # 5. Cumulative Event Probability
    # This plots the cumulative density 1 - S(t), not the cumulative hazard H(t).
    ax = axes[1, 1]
    kmf.fit(df['time'], df['event'])
    kmf.plot_cumulative_density(ax=ax, linewidth=2)
    ax.set_xlabel('Time (months)')
    ax.set_ylabel('Cumulative Event Probability')
    ax.set_title('Cumulative Event Probability')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

    # 6. Cox Proportional Hazards Model
    cph = CoxPHFitter()
    cph.fit(df[['time', 'event', 'group', 'age', 'risk_score']],
            duration_col='time', event_col='event')
    print("\n6. Cox Proportional Hazards Model:")
    print(cph.summary)

    # Hazard ratios (exponentiated coefficients)
    print("\nHazard Ratios:")
    for var in ['group', 'age', 'risk_score']:
        hr = np.exp(cph.params_[var])
        print(f"  {var}: {hr:.3f}")

    # 7. Model Diagnostics
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    covariates = ['group', 'age', 'risk_score']
    quartile_labels = ['Low', 'Medium-Low', 'Medium-High', 'High']

    # Predicted partial hazard against risk score, coloured by group
    ax = axes[0, 0]
    df_partial = df.copy()
    df_partial['partial_hazard'] = cph.predict_partial_hazard(df_partial[covariates])
    for group_val in [0, 1]:
        mask = df_partial['group'] == group_val
        ax.scatter(df_partial[mask]['risk_score'], df_partial[mask]['partial_hazard'],
                   alpha=0.6, label="Control" if group_val == 0 else "Treatment")
    ax.set_xlabel('Risk Score')
    ax.set_ylabel('Partial Hazard')
    ax.set_title('Partial Hazard by Risk Score and Group')
    ax.legend()
    ax.grid(True, alpha=0.3)

    # Concordance index (one overall value, displayed as text)
    ax = axes[0, 1]
    concordance_index = cph.concordance_index_
    ax.text(0.5, 0.5, f'Concordance Index: {concordance_index:.3f}',
            ha='center', va='center', fontsize=14,
            bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))
    ax.axis('off')
    ax.set_title('Model Performance')

    # Survival curves by predicted risk
    ax = axes[1, 0]
    df['predicted_hazard'] = cph.predict_partial_hazard(df[covariates])
    df['hazard_quartile'] = pd.qcut(df['predicted_hazard'], q=4, labels=quartile_labels)
    for hazard_group in quartile_labels:
        mask = df['hazard_quartile'] == hazard_group
        kmf.fit(df[mask]['time'], df[mask]['event'], label=hazard_group)
        kmf.plot_survival_function(ax=ax, linewidth=2)
    ax.set_xlabel('Time (months)')
    ax.set_ylabel('Survival Probability')
    ax.set_title('Survival by Predicted Risk Quartile')
    ax.grid(True, alpha=0.3)

    # Cox coefficients (log hazard ratios), sorted from most negative to most positive
    ax = axes[1, 1]
    coef_df = cph.summary[['coef', 'exp(coef)']].copy()
    coef_df = coef_df.sort_values('coef')
    colors = ['red' if x < 0 else 'green' for x in coef_df['coef']]
    ax.barh(coef_df.index, coef_df['coef'], color=colors, alpha=0.7, edgecolor='black')
    ax.set_xlabel('Coefficient')
    ax.set_title('Variable Coefficients')
    ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
    ax.grid(True, alpha=0.3, axis='x')
    plt.tight_layout()
    plt.show()

    # 8. Survival Prediction
    new_patient = pd.DataFrame({
        'group': [1],
        'age': [65],
        'risk_score': [75],
    })
    survival_prob = cph.predict_survival_function(new_patient, times=[6, 12, 24])
    print("\n8. Survival Prediction for New Patient (age 65, treatment, risk 75):")
    print(f"6-month survival: {survival_prob.iloc[0, 0]:.1%}")
    print(f"12-month survival: {survival_prob.iloc[1, 0]:.1%}")
    print(f"24-month survival: {survival_prob.iloc[2, 0]:.1%}")

    # 9. Proportional Hazards Assumption
    # lifelines.statistics provides proportional_hazard_test for this check.
    from lifelines.statistics import proportional_hazard_test

    print("\n9. Proportional Hazards Test:")
    ph_test = proportional_hazard_test(
        cph,
        df[['time', 'event', 'group', 'age', 'risk_score']],
        time_transform='rank',
    )
    print(ph_test.summary)

    # 10. Summary Statistics
    print("\n" + "=" * 50)
    print("SURVIVAL ANALYSIS SUMMARY")
    print("=" * 50)
    # Median survival is read from each group's Kaplan-Meier curve. The plain
    # median of the 'time' column would treat censored times as event times.
    for label, group_val in [("Control", 0), ("Treatment", 1)]:
        mask = df['group'] == group_val
        kmf.fit(df[mask]['time'], df[mask]['event'], label=label)
        print(f"{label} median survival: {kmf.median_survival_time_:.1f} months")
    print(f"Log-rank p-value: {results.p_value:.4f}")
    print(f"Concordance index: {concordance_index:.3f}")
    print("=" * 50)
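
The script above fits Kaplan-Meier and Cox models but no parametric model. As a rough sketch of the simplest parametric case, the exponential model, the hazard rate has a closed-form maximum-likelihood estimate under right censoring: the number of observed events divided by the total time at risk. The simulated data, the `exponential_rate` helper, and all variable names below are illustrative assumptions, and censoring here comes from a random follow-up window rather than the random indicator used in the script.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical data: exponential event times (mean 24 months in the control
# group), stretched by 30% in the treatment group.
group = rng.integers(0, 2, size=n)
true_time = rng.exponential(scale=24, size=n) * (1 + 0.3 * group)
follow_up = rng.uniform(6, 48, size=n)          # independent censoring times

time = np.minimum(true_time, follow_up)         # what is actually recorded
event = true_time <= follow_up                  # True = event observed


def exponential_rate(time, event):
    """MLE of a constant hazard: observed events / total time at risk."""
    return event.sum() / time.sum()


rate_control = exponential_rate(time[group == 0], event[group == 0])
rate_treat = exponential_rate(time[group == 1], event[group == 1])

hazard_ratio = rate_treat / rate_control        # true value here: 1 / 1.3
median_control = np.log(2) / rate_control       # solves exp(-rate * t) = 0.5

print(f"Control hazard rate:   {rate_control:.4f} per month")
print(f"Treatment hazard rate: {rate_treat:.4f} per month")
print(f"Hazard ratio:          {hazard_ratio:.3f}")
print(f"Control median:        {median_control:.1f} months")
```

Because the treatment times are stretched by a factor of 1.3, the hazard ratio in this simulation should land near 1/1.3, or about 0.77. A Weibull model relaxes the constant-hazard assumption at the cost of a numerical fit.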

Censoring Types

Right censoring

Event has not occurred by the end of observation (most common)

Left censoring

Event occurred before observation began, at an unknown time

Interval censoring

Event occurred at an unknown time within a known interval
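
To make right censoring concrete, the sketch below simulates it and compares two estimates of the 24-month survival probability: a naive one that drops censored subjects, and a hand-written Kaplan-Meier product-limit estimate that keeps them in the risk set until they are censored. The exponential event times, the uniform follow-up window, and the `kaplan_meier` helper are assumptions made for illustration; in practice a library routine such as lifelines' KaplanMeierFitter does this work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical setup: true event times are exponential with mean 24 months,
# and each subject is followed for a random window of 6 to 48 months.
true_time = rng.exponential(scale=24, size=n)
follow_up = rng.uniform(6, 48, size=n)

# Right censoring: record whichever comes first, plus a flag saying which.
time = np.minimum(true_time, follow_up)
event = true_time <= follow_up          # True = event observed, False = censored


def kaplan_meier(time, event, t):
    """Product-limit estimate of S(t) = P(T > t)."""
    surv = 1.0
    for u in np.unique(time[event & (time <= t)]):   # distinct event times up to t
        at_risk = np.sum(time >= u)                  # still observed just before u
        deaths = np.sum((time == u) & event)
        surv *= 1.0 - deaths / at_risk
    return surv


t = 24.0
truth = np.exp(-t / 24)                 # exact S(24) for this simulation
naive = np.mean(time[event] > t)        # uses only subjects with an observed event
km = kaplan_meier(time, event, t)

print(f"True S(24):            {truth:.3f}")
print(f"Naive (events only):   {naive:.3f}")
print(f"Kaplan-Meier estimate: {km:.3f}")
```

In this setup the naive estimate falls well below the true value, because the subjects with long survival times are the ones most likely to be censored and dropped, while the Kaplan-Meier estimate stays close to it.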

Model Comparison

Kaplan-Meier

Describes survival over time but does not adjust for covariates

Cox Model

Adjusts for covariates; assumes proportional hazards

Parametric

Assumes a distribution for the survival times

Competing Risks

Multiple event types

Applications

Clinical trials
Equipment reliability
Customer churn
Employee retention
Product lifetime

Deliverables

Kaplan-Meier survival curves
Survival probability estimates
Log-rank test results
Cox model coefficients
Hazard ratios
Risk stratification groups
Survival predictions
Model diagnostics
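
Competing risks, listed in the model comparison above, need their own estimator: treating a competing event as censoring and reporting 1 minus Kaplan-Meier overstates the probability of the event of interest. A minimal sketch of the Aalen-Johansen cumulative incidence estimate follows, written in plain NumPy. The simulated latent times, the status coding (0 = censored, 1 and 2 = event types), and both helper functions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Hypothetical latent times for two competing causes plus independent censoring.
t_cause1 = rng.exponential(scale=30, size=n)
t_cause2 = rng.exponential(scale=40, size=n)
t_censor = rng.uniform(10, 60, size=n)

time = np.minimum.reduce([t_cause1, t_cause2, t_censor])
# Status codes: 0 = censored, 1 = event of type 1, 2 = event of type 2
status = np.where(time == t_censor, 0, np.where(time == t_cause1, 1, 2))


def cumulative_incidence(time, status, cause, t):
    """Aalen-Johansen estimate of P(T <= t, event type == cause)."""
    surv = 1.0      # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    for u in np.unique(time[(status > 0) & (time <= t)]):
        at_risk = np.sum(time >= u)
        d_cause = np.sum((time == u) & (status == cause))
        d_any = np.sum((time == u) & (status > 0))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif, surv


def one_minus_km(time, status, cause, t):
    """Naive estimate that treats the competing event as censoring."""
    surv = 1.0
    for u in np.unique(time[(status == cause) & (time <= t)]):
        at_risk = np.sum(time >= u)
        surv *= 1.0 - np.sum((time == u) & (status == cause)) / at_risk
    return 1.0 - surv


t = 36.0
cif1, surv = cumulative_incidence(time, status, 1, t)
cif2, _ = cumulative_incidence(time, status, 2, t)
naive1 = one_minus_km(time, status, 1, t)

print(f"Cumulative incidence, cause 1: {cif1:.3f}")
print(f"Cumulative incidence, cause 2: {cif2:.3f}")
print(f"Event-free at 36 months:       {surv:.3f}")
print(f"Naive 1 - KM for cause 1:      {naive1:.3f}")
```

By construction the two cumulative incidences and the all-cause survival sum to 1, and the naive estimate is never smaller than the cumulative incidence.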
